What is claimed is: 

1. A computer program product for efficiently extracting data from a data stream, the computer program product embodied on one or more computer-readable media and comprising:
computer-readable program code means for defining one or more data extraction rules, each of the rules comprising one or more rule components;
computer-readable program code means for defining one or more output document templates for storing extracted data, wherein each of the templates comprises one or more tags which are hierarchically structured and wherein each template is to be associated with one or more of the data extraction rules;
computer-readable program code means for associating at least one of the templates with at least one of the rules;
computer-readable program code means for storing the rules, the templates, and the associations;
computer-readable program code means for monitoring at least one data stream for arrival of incoming data;
computer-readable program code means for comparing the incoming data to selected ones of the stored rules until detecting a matching rule;
computer-readable program code means for extracting data from the incoming data, upon detecting the matching rule, according to the matching rule; and
computer-readable program code means for storing the extracted data in an extensible document which is created according to the tags and structure of a selected one of the templates that is associated with the matching rule.

2. The computer program product according to Claim 1, wherein the computer-readable program code means for associating further comprises computer-readable program code means for associating the rule components of a particular rule with the tags of a particular template.

3. The computer program product according to Claim 1, further comprising computer-readable program code means for transforming the extracted data in the extensible document into another notation.

4. The computer program product according to Claim 1, further comprising computer-readable program code means for transforming the extracted data in the extensible document into another format.

5. The computer program product according to Claim 1, wherein the extensible document is an Extensible Markup Language ("XML") document.

6. The computer program product according to Claim 1, wherein the components of selected ones of the rules specify textual patterns.

7. The computer program product according to Claim 1, wherein the components of selected ones of the rules specify data element and attribute patterns.

8. The computer program product according to Claim 1, wherein the components of selected ones of the rules specify a combination of textual patterns and data element and attribute patterns.

9. A system for efficiently extracting data from a data stream, comprising:
means for defining one or more data extraction rules, each of the rules comprising one or more rule components;
means for defining one or more output document templates for storing extracted data, wherein each of the templates comprises one or more tags which are hierarchically structured and wherein each template is to be associated with one or more of the data extraction rules;
means for associating at least one of the templates with at least one of the rules;
means for storing the rules, the templates, and the associations;
means for monitoring at least one data stream for arrival of incoming data;
means for comparing the incoming data to selected ones of the stored rules until detecting a matching rule;
means for extracting data from the incoming data, upon detecting the matching rule, according to the matching rule; and
means for storing the extracted data in an extensible document which is created according to the tags and structure of a selected one of the templates that is associated with the matching rule.

10. The system according to Claim 9, wherein the means for associating further comprises means for associating the rule components of a particular rule with the tags of a particular template.



11. The system according to Claim 9, further comprising means for transforming the extracted data in the extensible document into another notation.

12. The system according to Claim 9, further comprising means for transforming the extracted data in the extensible document into another format.

13. The system according to Claim 9, wherein the extensible document is an Extensible Markup Language ("XML") document.

14. The system according to Claim 9, wherein the components of selected ones of the rules specify textual patterns.

15. The system according to Claim 9, wherein the components of selected ones of the rules specify data element and attribute patterns.

16. The system according to Claim 9, wherein the components of selected ones of the rules specify a combination of textual patterns and data element and attribute patterns.

17. A method for efficiently extracting data from a data stream, comprising the steps of:
defining one or more data extraction rules, each of the rules comprising one or more rule components;
defining one or more output document templates for storing extracted data, wherein each of the templates comprises one or more tags which are hierarchically structured and wherein each template is to be associated with one or more of the data extraction rules;
associating at least one of the templates with at least one of the rules;
storing the rules, the templates, and the associations;
monitoring at least one data stream for arrival of incoming data;
comparing the incoming data to selected ones of the stored rules until detecting a matching rule;
extracting data from the incoming data, upon detecting the matching rule, according to the matching rule; and
storing the extracted data in an extensible document which is created according to the tags and structure of a selected one of the templates that is associated with the matching rule.



18. The method according to Claim 17, wherein the associating step further comprises the step of associating the rule components of a particular rule with the tags of a particular template.

19. The method according to Claim 17, further comprising the step of transforming the extracted data in the extensible document into another notation.

20. The method according to Claim 17, further comprising the step of transforming the extracted data in the extensible document into another format.

21. The method according to Claim 17, wherein the extensible document is an Extensible Markup Language ("XML") document.

22. The method according to Claim 17, wherein the components of selected ones of the rules specify textual patterns.

23. The method according to Claim 17, wherein the components of selected ones of the rules specify data element and attribute patterns.

24. The method according to Claim 17, wherein the components of selected ones of the rules specify a combination of textual patterns and data element and attribute patterns.

25. The method according to Claim 17, wherein the data stream is a legacy host stream containing one or more presentation spaces.

26. The method according to Claim 17, wherein the data stream is sent between peer applications.

27. The method according to Claim 26, wherein the data stream contains one or more Extensible Markup Language ("XML") documents.

28. The method according to Claim 17, wherein the data stream contains one or more Web pages.
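
For illustration only, and not as part of the claims: the steps recited in Claim 17 can be sketched in Python roughly as follows. Everything named here is an assumption made for the sketch and does not come from the claims: the Rule and Template classes, the build_document and process functions, the sample pattern, and the sample stream are hypothetical. Regular-expression named groups stand in for rule components, slash-separated paths stand in for hierarchically structured tags, and a plain list stands in for the monitored data stream.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical extraction rule; each named group is one rule component."""
    name: str
    pattern: re.Pattern


@dataclass
class Template:
    """Hypothetical output template: a root tag plus component-to-tag-path mapping."""
    root: str
    paths: dict  # component name -> hierarchical tag path such as "header/id"


def build_document(template, components):
    """Create an XML element tree that follows the template's tags and structure."""
    root = ET.Element(template.root)
    for component, path in template.paths.items():
        node = root
        for tag in path.split("/"):
            child = node.find(tag)
            if child is None:
                child = ET.SubElement(node, tag)
            node = child
        node.text = components[component]
    return root


def process(stream, rules, associations):
    """Compare each incoming item to the stored rules until one matches, then extract."""
    for incoming in stream:
        for rule in rules:
            match = rule.pattern.search(incoming)
            if match:
                yield build_document(associations[rule.name], match.groupdict())
                break  # stop comparing once a matching rule is found


# Assumed sample data: one rule, one associated template, two incoming items.
rules = [Rule("order", re.compile(r"ORDER (?P<id>\d+) FOR (?P<customer>\w+)"))]
associations = {
    "order": Template("order", {"id": "header/id", "customer": "header/customer/name"})
}
stream = ["WELCOME SCREEN", "ORDER 1042 FOR Smith"]
documents = [ET.tostring(doc, encoding="unicode") for doc in process(stream, rules, associations)]
```

In this sketch the first item matches no rule and is skipped, while the second yields a single XML document whose nesting comes from the template rather than from the incoming text. Keeping the template separate from the rule is one way to reflect the association step of the claims; other designs are equally possible.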